Automatic Documents Annotation by Keyphrase Extraction in Digital Libraries using Taxonomy

Authors

  • Iram Fatima
  • Asad Masood Khattak
  • Young-Koo Lee
  • Sungyoung Lee
Abstract

Keyphrases are useful for a variety of purposes, including text clustering, classification, content-based retrieval, and automatic text summarization. Only a small number of documents have author-assigned keyphrases. Manually assigning keyphrases to existing documents is tedious; therefore, automatic keyphrase extraction has been used extensively to organize documents. Existing automatic keyphrase extraction algorithms are limited in their ability to assign semantically relevant keyphrases to documents. In this paper, we propose a methodology for assigning keyphrases to digital documents. Our approach exploits the semantic relationships and hierarchical structure of the classification scheme to filter out irrelevant keyphrases suggested by the Keyphrase Extraction Algorithm (KEA++). Experiments demonstrate that the refinement improves the precision of extracted keyphrases from 0.19% to 0.38% while maintaining the same recall.
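The filtering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm and does not use KEA++ itself: the toy taxonomy, the `tree_distance` measure, the `max_distance` threshold, and all function names are assumptions made for illustration. The sketch keeps a candidate keyphrase only if it lies close, in the taxonomy hierarchy, to at least one other candidate, and then compares precision and recall before and after filtering on made-up data.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy taxonomy: child term -> parent term (None marks the root).
TAXONOMY: Dict[str, Optional[str]] = {
    "computer science": None,
    "information retrieval": "computer science",
    "text mining": "information retrieval",
    "keyphrase extraction": "text mining",
    "digital libraries": "information retrieval",
    "hardware": "computer science",
    "printers": "hardware",
}


def ancestors(term: str) -> List[str]:
    """Return the path from `term` up to the root, starting with `term`."""
    path = []
    current: Optional[str] = term
    while current is not None:
        path.append(current)
        current = TAXONOMY[current]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges between two terms via their lowest common ancestor."""
    depth_in_b = {node: i for i, node in enumerate(ancestors(b))}
    for i, node in enumerate(ancestors(a)):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError("terms do not share a root")


def filter_candidates(candidates: List[str], max_distance: int = 2) -> List[str]:
    """Keep candidates that are within `max_distance` of another candidate.

    Assumed heuristic: an isolated candidate is treated as irrelevant.
    """
    kept = []
    for term in candidates:
        others = [c for c in candidates if c != term]
        if any(tree_distance(term, o) <= max_distance for o in others):
            kept.append(term)
    return kept


def precision_recall(extracted: List[str], gold: Set[str]) -> Tuple[float, float]:
    """Precision and recall of extracted keyphrases against a gold set."""
    hits = len(set(extracted) & gold)
    precision = hits / len(extracted) if extracted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Made-up candidates (standing in for an extractor's output) and gold labels.
candidates = ["keyphrase extraction", "text mining", "digital libraries", "printers"]
gold = {"keyphrase extraction", "digital libraries"}

filtered = filter_candidates(candidates)
print(precision_recall(candidates, gold))
print(precision_recall(filtered, gold))
```

On this toy input the isolated candidate "printers" is dropped, so precision rises while recall is unchanged, which mirrors the kind of effect the abstract reports. The figures produced here come from the invented data and say nothing about the paper's actual results.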


Similar resources

Bringing Order to Digital Libraries: From Keyphrase Extraction to Index Term Assignment

Collections of topically related documents held by digital libraries are valuable resources for users; however, as collections grow, it becomes more difficult to search them for specific information. Structure needs to be introduced to facilitate searching. Assigning index terms is helpful, but it is a tedious task even for professional indexers, requiring knowledge about the collection in gene...


Human Evaluation of Kea, an Automatic Keyphrasing System

This paper describes an evaluation of the Kea automatic keyphrase extraction algorithm. Tools that automatically identify keyphrases are desirable because document keyphrases have numerous applications in digital library systems, but are costly and time consuming to manually assign. Keyphrase extraction algorithms are usually evaluated by comparison to author-specified keywords, but this method...


Fuzzy Neighbor Voting for Automatic Image Annotation

With the rapid growth of digital images and the availability of imaging tools, massive numbers of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of the most challenging fields in image processing. Automatic image annotation (AIA) refers to attaching words, keywords or comments to an image or to a selected part of it. In this paper,...


Semantically Enhanced Automatic Keyphrase Indexing

The goal of this PhD thesis is to elaborate methods for automatic keyphrase indexing with a controlled vocabulary. Keyphrases are single words or multi-word lexemes that concisely and accurately describe the subject or an aspect of the subject discussed in a document. They are widely used in large document collections such as digital libraries and document repositories. They help organize mater...


Domain-independent automatic keyphrase indexing with small training sets

Keyphrases are widely used in both physical and digital libraries as a brief but precise summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive, because trained human indexers must reach an understanding of the document and select appropriate descriptors according to define...




Publication date: 2011